Overview

Dataset statistics

Number of variables: 12
Number of observations: 2260507
Missing cells: 1213442
Missing cells (%): 4.5%
Duplicate rows: 5111
Duplicate rows (%): 0.2%
Total size in memory: 207.0 MiB
Average record size in memory: 96.0 B
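
The percentages and the average record size above follow directly from the raw counts. A minimal Python check, using only the figures reported in this overview (the variable names are illustrative, and 1 MiB is assumed to be 1024² bytes):

```python
# Figures copied from the dataset statistics above.
n_variables = 12
n_observations = 2_260_507
missing_cells = 1_213_442
duplicate_rows = 5_111
total_bytes = 207.0 * 1024**2  # 207.0 MiB, assuming 1 MiB = 1024**2 bytes

# A "cell" is one value of one variable for one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_bytes / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The 96 B average is consistent with 12 columns stored at 8 bytes per value, which also matches the 17.2 MiB memory size reported for each variable.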

Variable types

Categorical: 5
Numeric: 7

Warnings

Dataset has 5111 (0.2%) duplicate rows Duplicates
Start_Time has a high cardinality: 2206603 distinct values High cardinality
Weather_Condition has a high cardinality: 116 distinct values High cardinality
Precipitation(in) has 1203775 (53.3%) missing values Missing
Precipitation(in) is highly skewed (γ1 = 57.07482798) Skewed
Start_Time is uniformly distributed Uniform
Wind_Speed(mph) has 158060 (7.0%) zeros Zeros
Precipitation(in) has 886656 (39.2%) zeros Zeros

Reproduction

Analysis started: 2021-05-14 14:08:40.814660
Analysis finished: 2021-05-14 14:12:06.530764
Duration: 3 minutes and 25.72 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

State
Categorical

Distinct: 49
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.2 MiB

Value | Count
CA | 380086
TX | 263977
FL | 194634
SC | 158916
NC | 123051
Other values (44) | 1139843

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 4521014
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OH
2nd row: OH
3rd row: OH
4th row: OH
5th row: OH

Value | Count | Frequency (%)
CA | 380086 | 16.8%
TX | 263977 | 11.7%
FL | 194634 | 8.6%
SC | 158916 | 7.0%
NC | 123051 | 5.4%
NY | 113943 | 5.0%
PA | 76754 | 3.4%
MI | 72226 | 3.2%
IL | 64985 | 2.9%
GA | 64630 | 2.9%
Other values (39) | 747305 | 33.1%
Histogram of lengths of the category
ValueCountFrequency (%)
ca380086
16.8%
tx263977
 
11.7%
fl194634
 
8.6%
sc158916
 
7.0%
nc123051
 
5.4%
ny113943
 
5.0%
pa76754
 
3.4%
mi72226
 
3.2%
il64985
 
2.9%
ga64630
 
2.9%
Other values (39)747305
33.1%

Most occurring characters

ValueCountFrequency (%)
A821981
18.2%
C694125
15.4%
N430000
9.5%
L358901
7.9%
T347709
7.7%
X263977
 
5.8%
M204829
 
4.5%
F194634
 
4.3%
I190039
 
4.2%
S166809
 
3.7%
Other values (14)848010
18.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4521014
100.0%

Most frequent character per category

ValueCountFrequency (%)
A821981
18.2%
C694125
15.4%
N430000
9.5%
L358901
7.9%
T347709
7.7%
X263977
 
5.8%
M204829
 
4.5%
F194634
 
4.3%
I190039
 
4.2%
S166809
 
3.7%
Other values (14)848010
18.8%

Most occurring scripts

ValueCountFrequency (%)
Latin4521014
100.0%

Most frequent character per script

ValueCountFrequency (%)
A821981
18.2%
C694125
15.4%
N430000
9.5%
L358901
7.9%
T347709
7.7%
X263977
 
5.8%
M204829
 
4.5%
F194634
 
4.3%
I190039
 
4.2%
S166809
 
3.7%
Other values (14)848010
18.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII4521014
100.0%

Most frequent character per block

ValueCountFrequency (%)
A821981
18.2%
C694125
15.4%
N430000
9.5%
L358901
7.9%
T347709
7.7%
X263977
 
5.8%
M204829
 
4.5%
F194634
 
4.3%
I190039
 
4.2%
S166809
 
3.7%
Other values (14)848010
18.8%

Start_Time
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 2206603
Distinct (%): 97.6%
Missing: 0
Missing (%): 0.0%
Memory size: 17.2 MiB

Value | Count
2018-11-25 01:22:49 | 31
2018-11-12 00:37:27 | 27
2016-04-10 08:59:26 | 27
2017-09-09 09:03:14 | 23
2017-09-06 15:52:36 | 22
Other values (2206598) | 2260377

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 42949633
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2158939
Unique (%): 95.5%

Sample

1st row: 2016-02-08 06:49:27
2nd row: 2016-02-08 07:23:34
3rd row: 2016-02-08 07:39:07
4th row: 2016-02-08 07:44:26
5th row: 2016-02-08 07:59:35

Value | Count | Frequency (%)
2018-11-25 01:22:49 | 31 | < 0.1%
2018-11-12 00:37:27 | 27 | < 0.1%
2016-04-10 08:59:26 | 27 | < 0.1%
2017-09-09 09:03:14 | 23 | < 0.1%
2017-09-06 15:52:36 | 22 | < 0.1%
2019-12-17 06:32:11 | 22 | < 0.1%
2016-06-12 10:07:37 | 22 | < 0.1%
2018-03-28 02:09:15 | 21 | < 0.1%
2016-05-21 08:30:42 | 21 | < 0.1%
2020-02-13 06:52:38 | 20 | < 0.1%
Other values (2206593) | 2260271 | > 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
2019-11-152882
 
0.1%
2019-11-222829
 
0.1%
2019-11-142791
 
0.1%
2019-11-122782
 
0.1%
2018-11-092777
 
0.1%
2018-11-062724
 
0.1%
2019-10-162638
 
0.1%
2019-11-202633
 
0.1%
2019-11-212626
 
0.1%
2018-11-022596
 
0.1%
Other values (86571)4493736
99.4%

Most occurring characters

ValueCountFrequency (%)
07626939
17.8%
16561039
15.3%
25495995
12.8%
-4521014
10.5%
:4521014
10.5%
2260507
 
5.3%
31835466
 
4.3%
81825787
 
4.3%
51798008
 
4.2%
41760142
 
4.1%
Other values (3)4743722
11.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number31647098
73.7%
Dash Punctuation4521014
 
10.5%
Other Punctuation4521014
 
10.5%
Space Separator2260507
 
5.3%

Most frequent character per category

ValueCountFrequency (%)
07626939
24.1%
16561039
20.7%
25495995
17.4%
31835466
 
5.8%
81825787
 
5.8%
51798008
 
5.7%
41760142
 
5.6%
91703715
 
5.4%
71671164
 
5.3%
61368843
 
4.3%
ValueCountFrequency (%)
-4521014
100.0%
ValueCountFrequency (%)
2260507
100.0%
ValueCountFrequency (%)
:4521014
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common42949633
100.0%

Most frequent character per script

ValueCountFrequency (%)
07626939
17.8%
16561039
15.3%
25495995
12.8%
-4521014
10.5%
:4521014
10.5%
2260507
 
5.3%
31835466
 
4.3%
81825787
 
4.3%
51798008
 
4.2%
41760142
 
4.1%
Other values (3)4743722
11.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII42949633
100.0%

Most frequent character per block

ValueCountFrequency (%)
07626939
17.8%
16561039
15.3%
25495995
12.8%
-4521014
10.5%
:4521014
10.5%
2260507
 
5.3%
31835466
 
4.3%
81825787
 
4.3%
51798008
 
4.2%
41760142
 
4.1%
Other values (3)4743722
11.0%

Severity
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.2 MiB

Value | Count
2 | 1505417
3 | 745877
4 | 8267
1 | 946

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2260507
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 3
3rd row: 2
4th row: 3
5th row: 2

Value | Count | Frequency (%)
2 | 1505417 | 66.6%
3 | 745877 | 33.0%
4 | 8267 | 0.4%
1 | 946 | < 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
21505417
66.6%
3745877
33.0%
48267
 
0.4%
1946
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
21505417
66.6%
3745877
33.0%
48267
 
0.4%
1946
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2260507
100.0%

Most frequent character per category

ValueCountFrequency (%)
21505417
66.6%
3745877
33.0%
48267
 
0.4%
1946
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common2260507
100.0%

Most frequent character per script

ValueCountFrequency (%)
21505417
66.6%
3745877
33.0%
48267
 
0.4%
1946
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2260507
100.0%

Most frequent character per block

ValueCountFrequency (%)
21505417
66.6%
3745877
33.0%
48267
 
0.4%
1946
 
< 0.1%

Start_Lng
Real number (ℝ)

Distinct: 708230
Distinct (%): 31.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -92.61955627
Minimum: -124.623833
Maximum: -67.839745
Zeros: 0
Zeros (%): 0.0%
Memory size: 17.2 MiB

Quantile statistics

Minimum: -124.623833
5th percentile: -122.086945
Q1: -97.785866
Median: -86.780106
Q3: -80.818542
95th percentile: -73.8449961
Maximum: -67.839745
Range: 56.784088
Interquartile range (IQR): 16.967324

Descriptive statistics

Standard deviation: 15.87602625
Coefficient of variation (CV): -0.1714111672
Kurtosis: -0.8004525872
Mean: -92.61955627
Median Absolute Deviation (MAD): 8.790725
Skewness: -0.740033074
Sum: -209367155.3
Variance: 252.0482095
Monotonicity: Not monotonic
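
Several of the derived statistics above can be recomputed from the others. The sketch below, with illustrative variable names and figures copied from the Start_Lng panel, checks the range, the interquartile range, the variance, and the coefficient of variation (standard deviation divided by mean):

```python
# Figures copied from the Start_Lng statistics above.
mean = -92.61955627
std = 15.87602625
minimum, maximum = -124.623833, -67.839745
q1, q3 = -97.785866, -80.818542

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
variance = std ** 2               # Variance is the squared standard deviation
cv = std / mean                   # Coefficient of variation

print(f"Range: {value_range:.6f}")
print(f"IQR: {iqr:.6f}")
print(f"Variance: {variance:.4f}")
print(f"CV: {cv:.6f}")
```

The CV is negative only because the mean longitude is negative; for a coordinate such as longitude, whose zero point is arbitrary, the CV carries little meaning.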
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
-84.390343 | 482 | < 0.1%
-83.111794 | 472 | < 0.1%
-122.366852 | 458 | < 0.1%
-82.259857 | 439 | < 0.1%
-118.096634 | 414 | < 0.1%
-83.058128 | 409 | < 0.1%
-80.204353 | 381 | < 0.1%
-93.26992 | 373 | < 0.1%
-82.2603 | 367 | < 0.1%
-118.368263 | 354 | < 0.1%
Other values (708220) | 2256358 | 99.8%

Minimum 5 values

Value | Count | Frequency (%)
-124.623833 | 1 | < 0.1%
-124.534439 | 1 | < 0.1%
-124.484421 | 1 | < 0.1%
-124.479179 | 1 | < 0.1%
-124.479156 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
-67.839745 | 1 | < 0.1%
-67.841858 | 1 | < 0.1%
-68.060165 | 1 | < 0.1%
-68.14003 | 1 | < 0.1%
-68.380852 | 1 | < 0.1%

Start_Lat
Real number (ℝ≥0)

Distinct: 743384
Distinct (%): 32.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.07856652
Minimum: 24.555269
Maximum: 49.002201
Zeros: 0
Zeros (%): 0.0%
Memory size: 17.2 MiB

Quantile statistics

Minimum: 24.555269
5th percentile: 28.12026
Q1: 32.92453
Median: 35.391476
Q3: 40.068172
95th percentile: 43.2429418
Maximum: 49.002201
Range: 24.446932
Interquartile range (IQR): 7.143642

Descriptive statistics

Standard deviation: 4.932324556
Coefficient of variation (CV): 0.1367106576
Kurtosis: -0.6066374191
Mean: 36.07856652
Median Absolute Deviation (MAD): 3.653068
Skewness: 0.08649421934
Sum: 81555852.18
Variance: 24.32782552
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
33.744976 | 483 | < 0.1%
42.476501 | 472 | < 0.1%
37.808498 | 452 | < 0.1%
34.858925 | 438 | < 0.1%
33.941364 | 416 | < 0.1%
42.368423 | 408 | < 0.1%
25.789072 | 380 | < 0.1%
44.966118 | 372 | < 0.1%
34.858795 | 366 | < 0.1%
34.833031 | 349 | < 0.1%
Other values (743374) | 2256371 | 99.8%

Minimum 5 values

Value | Count | Frequency (%)
24.555269 | 1 | < 0.1%
24.5574 | 1 | < 0.1%
24.55987 | 1 | < 0.1%
24.560246 | 1 | < 0.1%
24.560688 | 1 | < 0.1%
ValueCountFrequency (%)
49.0022011
 
< 0.1%
49.0007591
 
< 0.1%
48.9999011
 
< 0.1%
48.9995691
 
< 0.1%
48.9982414
< 0.1%

Humidity(%)
Real number (ℝ≥0)

Distinct: 100
Distinct (%): < 0.1%
Missing: 2489
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 65.75183546
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 17.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 27
Q1: 50
Median: 68
Q3: 84
95th percentile: 97
Maximum: 100
Range: 99
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 22.05538544
Coefficient of variation (CV): 0.3354337607
Kurtosis: -0.6871174464
Mean: 65.75183546
Median Absolute Deviation (MAD): 17
Skewness: -0.3883801916
Sum: 148468828
Variance: 486.440027
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
93 | 86908 | 3.8%
100 | 85379 | 3.8%
90 | 56706 | 2.5%
87 | 54162 | 2.4%
96 | 42259 | 1.9%
84 | 40905 | 1.8%
89 | 40272 | 1.8%
94 | 39609 | 1.8%
81 | 38340 | 1.7%
82 | 37730 | 1.7%
Other values (90) | 1735748 | 76.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 9 | < 0.1%
3 | 45 | < 0.1%
4 | 465 | < 0.1%
5 | 985 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
100 | 85379 | 3.8%
99 | 2893 | 0.1%
98 | 1699 | 0.1%
97 | 29125 | 1.3%
96 | 42259 | 1.9%

Pressure(in)
Real number (ℝ≥0)

Distinct: 455
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.81763357
Minimum: 26.51
Maximum: 31.15
Zeros: 0
Zeros (%): 0.0%
Memory size: 17.2 MiB

Quantile statistics

Minimum: 26.51
5th percentile: 28.93
Q1: 29.7
Median: 29.94
Q3: 30.08
95th percentile: 30.32
Maximum: 31.15
Range: 4.64
Interquartile range (IQR): 0.38

Descriptive statistics

Standard deviation: 0.451540204
Coefficient of variation (CV): 0.01514339503
Kurtosis: 5.726885162
Mean: 29.81763357
Median Absolute Deviation (MAD): 0.17
Skewness: -1.859723004
Sum: 67402969.41
Variance: 0.2038885558
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
29.96 | 45266 | 2.0%
30.01 | 44905 | 2.0%
29.99 | 44680 | 2.0%
29.94 | 43410 | 1.9%
30.04 | 42473 | 1.9%
30.06 | 40576 | 1.8%
29.91 | 40384 | 1.8%
30.03 | 38301 | 1.7%
29.97 | 38180 | 1.7%
29.98 | 38060 | 1.7%
Other values (445) | 1844272 | 81.6%

Minimum 5 values

Value | Count | Frequency (%)
26.51 | 8 | < 0.1%
26.52 | 11 | < 0.1%
26.53 | 18 | < 0.1%
26.54 | 11 | < 0.1%
26.55 | 14 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
31.15 | 1 | < 0.1%
31.12 | 2 | < 0.1%
31.1 | 1 | < 0.1%
31.08 | 3 | < 0.1%
31.03 | 2 | < 0.1%

Temperature(F)
Real number (ℝ)

Distinct: 791
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.17708235
Minimum: -33
Maximum: 129.2
Zeros: 432
Zeros (%): < 0.1%
Memory size: 17.2 MiB

Quantile statistics

Minimum: -33
5th percentile: 30
Q1: 51.1
Median: 66
Q3: 77
95th percentile: 89.6
Maximum: 129.2
Range: 162.2
Interquartile range (IQR): 25.9

Descriptive statistics

Standard deviation: 18.64594592
Coefficient of variation (CV): 0.295137813
Kurtosis: -0.008301761303
Mean: 63.17708235
Median Absolute Deviation (MAD): 12.9
Skewness: -0.5398998588
Sum: 142812236.9
Variance: 347.6712992
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
77 | 54259 | 2.4%
68 | 50786 | 2.2%
73 | 49213 | 2.2%
75 | 46129 | 2.0%
72 | 45406 | 2.0%
70 | 44082 | 2.0%
59 | 43525 | 1.9%
79 | 42277 | 1.9%
63 | 41238 | 1.8%
64 | 40816 | 1.8%
Other values (781) | 1802776 | 79.8%

Minimum 5 values

Value | Count | Frequency (%)
-33 | 1 | < 0.1%
-29 | 2 | < 0.1%
-27.9 | 10 | < 0.1%
-27.4 | 2 | < 0.1%
-27 | 8 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
129.2 | 2 | < 0.1%
127 | 1 | < 0.1%
123.8 | 1 | < 0.1%
123 | 2 | < 0.1%
122 | 5 | < 0.1%

Wind_Direction
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 18
Missing (%): < 0.1%
Memory size: 17.2 MiB

Value | Count
S | 211943
W | 178849
CALM | 158059
N | 151502
SSW | 133831
Other values (13) | 1426305

Length

Max length: 4
Median length: 3
Mean length: 2.288286296
Min length: 1

Characters and Unicode

Total characters: 5172646
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SW
2nd row: SW
3rd row: SW
4th row: SSW
5th row: WSW

Value | Count | Frequency (%)
S | 211943 | 9.4%
W | 178849 | 7.9%
CALM | 158059 | 7.0%
N | 151502 | 6.7%
SSW | 133831 | 5.9%
VAR | 133116 | 5.9%
SW | 127078 | 5.6%
E | 123720 | 5.5%
SSE | 122850 | 5.4%
WNW | 121172 | 5.4%
Other values (8) | 798369 | 35.3%
Histogram of lengths of the category
ValueCountFrequency (%)
s211943
 
9.4%
w178849
 
7.9%
calm158059
 
7.0%
n151502
 
6.7%
ssw133831
 
5.9%
var133116
 
5.9%
sw127078
 
5.6%
e123720
 
5.5%
sse122850
 
5.4%
wnw121172
 
5.4%
Other values (8)798369
35.3%

Most occurring characters

ValueCountFrequency (%)
S1152969
22.3%
W1139061
22.0%
N970519
18.8%
E878513
17.0%
A291175
 
5.6%
C158059
 
3.1%
L158059
 
3.1%
M158059
 
3.1%
V133116
 
2.6%
R133116
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter5172646
100.0%

Most frequent character per category

ValueCountFrequency (%)
S1152969
22.3%
W1139061
22.0%
N970519
18.8%
E878513
17.0%
A291175
 
5.6%
C158059
 
3.1%
L158059
 
3.1%
M158059
 
3.1%
V133116
 
2.6%
R133116
 
2.6%

Most occurring scripts

ValueCountFrequency (%)
Latin5172646
100.0%

Most frequent character per script

ValueCountFrequency (%)
S1152969
22.3%
W1139061
22.0%
N970519
18.8%
E878513
17.0%
A291175
 
5.6%
C158059
 
3.1%
L158059
 
3.1%
M158059
 
3.1%
V133116
 
2.6%
R133116
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII5172646
100.0%

Most frequent character per block

ValueCountFrequency (%)
S1152969
22.3%
W1139061
22.0%
N970519
18.8%
E878513
17.0%
A291175
 
5.6%
C158059
 
3.1%
L158059
 
3.1%
M158059
 
3.1%
V133116
 
2.6%
R133116
 
2.6%

Wind_Speed(mph)
Real number (ℝ≥0)

ZEROS

Distinct: 125
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.098481978
Minimum: 0
Maximum: 175
Zeros: 158060
Zeros (%): 7.0%
Memory size: 17.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 4.6
Median: 7
Q3: 10.4
95th percentile: 17
Maximum: 175
Range: 175
Interquartile range (IQR): 5.8

Descriptive statistics

Standard deviation: 4.869992634
Coefficient of variation (CV): 0.6013463569
Kurtosis: 10.02188099
Mean: 8.098481978
Median Absolute Deviation (MAD): 2.4
Skewness: 1.167952981
Sum: 18306675.2
Variance: 23.71682825
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4.6 | 167730 | 7.4%
5.8 | 165738 | 7.3%
0 | 158060 | 7.0%
3.5 | 156734 | 6.9%
6.9 | 153797 | 6.8%
8.1 | 137381 | 6.1%
9.2 | 121718 | 5.4%
10.4 | 99603 | 4.4%
5 | 94043 | 4.2%
6 | 91210 | 4.0%
Other values (115) | 914493 | 40.5%

Minimum 5 values

Value | Count | Frequency (%)
0 | 158060 | 7.0%
1 | 72 | < 0.1%
1.2 | 319 | < 0.1%
2 | 155 | < 0.1%
2.3 | 638 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
175 | 3 | < 0.1%
174.9 | 1 | < 0.1%
162.3 | 2 | < 0.1%
161 | 1 | < 0.1%
157 | 1 | < 0.1%

Precipitation(in)
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 251
Distinct (%): < 0.1%
Missing: 1203775
Missing (%): 53.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.01473804143
Minimum: 0
Maximum: 25
Zeros: 886656
Zeros (%): 39.2%
Memory size: 17.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0.07
Maximum: 25
Range: 25
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1540297482
Coefficient of variation (CV): 10.45116808
Kurtosis: 4030.039281
Mean: 0.01473804143
Median Absolute Deviation (MAD): 0
Skewness: 57.07482798
Sum: 15574.16
Variance: 0.02372516334
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 886656 | 39.2%
0.01 | 47242 | 2.1%
0.02 | 23404 | 1.0%
0.03 | 16045 | 0.7%
0.04 | 11781 | 0.5%
0.05 | 9587 | 0.4%
0.06 | 7637 | 0.3%
0.07 | 6287 | 0.3%
0.08 | 5020 | 0.2%
0.09 | 4491 | 0.2%
Other values (241) | 38582 | 1.7%
(Missing) | 1203775 | 53.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 886656 | 39.2%
0.01 | 47242 | 2.1%
0.02 | 23404 | 1.0%
0.03 | 16045 | 0.7%
0.04 | 11781 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
25 | 1 | < 0.1%
10.8 | 1 | < 0.1%
10.14 | 2 | < 0.1%
10.13 | 1 | < 0.1%
10.11 | 1 | < 0.1%

Weather_Condition
Categorical

HIGH CARDINALITY

Distinct: 116
Distinct (%): < 0.1%
Missing: 7160
Missing (%): 0.3%
Memory size: 17.2 MiB

Value | Count
Clear | 470336
Fair | 395893
Mostly Cloudy | 332231
Overcast | 245635
Partly Cloudy | 228108
Other values (111) | 581144

Length

Max length: 35
Median length: 8
Mean length: 8.451608651
Min length: 3

Characters and Unicode

Total characters: 19044407
Distinct characters: 45
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: Overcast
2nd row: Mostly Cloudy
3rd row: Mostly Cloudy
4th row: Light Rain
5th row: Overcast

Value | Count | Frequency (%)
Clear | 470336 | 20.8%
Fair | 395893 | 17.5%
Mostly Cloudy | 332231 | 14.7%
Overcast | 245635 | 10.9%
Partly Cloudy | 228108 | 10.1%
Cloudy | 151073 | 6.7%
Scattered Clouds | 133415 | 5.9%
Light Rain | 120933 | 5.3%
Light Snow | 30993 | 1.4%
Rain | 27883 | 1.2%
Other values (106) | 116847 | 5.2%
Histogram of lengths of the category
ValueCountFrequency (%)
cloudy718172
22.4%
clear470336
14.7%
fair400038
12.5%
mostly334760
10.4%
overcast245635
 
7.7%
partly229596
 
7.2%
rain173341
 
5.4%
light172045
 
5.4%
scattered133415
 
4.2%
clouds133415
 
4.2%
Other values (46)194758
 
6.1%

Most occurring characters

ValueCountFrequency (%)
l1900400
 
10.0%
a1699260
 
8.9%
r1528578
 
8.0%
C1321934
 
6.9%
y1317857
 
6.9%
t1278045
 
6.7%
o1267582
 
6.7%
e1064542
 
5.6%
d1024699
 
5.4%
952164
 
5.0%
Other values (35)5689346
29.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14892062
78.2%
Uppercase Letter3179695
 
16.7%
Space Separator952164
 
5.0%
Other Punctuation14752
 
0.1%
Dash Punctuation5734
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
l1900400
12.8%
a1699260
11.4%
r1528578
10.3%
y1317857
8.8%
t1278045
8.6%
o1267582
8.5%
e1064542
7.1%
d1024699
6.9%
u869742
5.8%
i793586
 
5.3%
Other values (14)2147771
14.4%
ValueCountFrequency (%)
C1321934
41.6%
F425068
 
13.4%
M337152
 
10.6%
O245635
 
7.7%
P231433
 
7.3%
S180301
 
5.7%
R173341
 
5.5%
L172049
 
5.4%
H37421
 
1.2%
T23668
 
0.7%
Other values (8)31693
 
1.0%
ValueCountFrequency (%)
952164
100.0%
ValueCountFrequency (%)
/14752
100.0%
ValueCountFrequency (%)
-5734
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin18071757
94.9%
Common972650
 
5.1%

Most frequent character per script

ValueCountFrequency (%)
l1900400
 
10.5%
a1699260
 
9.4%
r1528578
 
8.5%
C1321934
 
7.3%
y1317857
 
7.3%
t1278045
 
7.1%
o1267582
 
7.0%
e1064542
 
5.9%
d1024699
 
5.7%
u869742
 
4.8%
Other values (32)4799118
26.6%
ValueCountFrequency (%)
952164
97.9%
/14752
 
1.5%
-5734
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII19044407
100.0%

Most frequent character per block

ValueCountFrequency (%)
l1900400
 
10.0%
a1699260
 
8.9%
r1528578
 
8.0%
C1321934
 
6.9%
y1317857
 
6.9%
t1278045
 
6.7%
o1267582
 
6.7%
e1064542
 
5.6%
d1024699
 
5.4%
952164
 
5.0%
Other values (35)5689346
29.9%

Interactions

[42 interaction plots; figures not recoverable from the extracted text]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
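
The calculation described above can be sketched in plain Python. This is an illustrative from-scratch version, not the routine the report itself ran; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (std_x * std_y)

# Hypothetical data, used only to illustrate invariance under location and scale.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 9.0, 11.0]
shifted_scaled_x = [3 * a + 5 for a in x]

print(pearson_r(x, y))
print(pearson_r(shifted_scaled_x, y))  # same value up to rounding
```

Because the same n - 1 factor appears in the covariance and in both standard deviations, it cancels, so using population rather than sample estimates would give the same r.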

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
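
As a rough illustration of the rank-based calculation above, the following sketch ranks each variable (ties receive the mean of their ranks, which is an assumption about tie handling) and then applies the Pearson formula to the ranks. All function names and the sample data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: rho is 1 while r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
print(pearson_r(x, y))
```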

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
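
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; it is an illustration under that assumption and makes no tie adjustment, so libraries that compute the tie-corrected tau-b variant may report different values on data with ties. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y order the pair the same way.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y: neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant, 6 pairs
```

This quadratic-time loop is fine for small examples but would be far too slow for the 2,260,507 observations profiled here.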

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
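
For illustration, one common formulation of the bias-corrected Cramér's V can be sketched in plain Python from a contingency table of counts. The function name, the list-of-lists table layout, and the example tables are assumptions; the report's own implementation may differ in details.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return sqrt(phi2_corr / denom)

print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent rows and columns
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated
```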

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
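
Nullity correlation, as described above, can be read as the Pearson correlation between the presence indicators (1 if a value is present, 0 if it is missing) of two columns. The sketch below works under that assumption; the function name is hypothetical, missing values are represented by `None`, and the sample columns are invented for illustration.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is fully present or fully missing has no nullity variance.
        return None
    return cov / sqrt(var_a * var_b)

# Invented columns: values are missing on exactly the same rows.
precipitation = [None, 0.03, None, 0.0]
wind_speed = [None, 3.5, None, 5.0]
print(nullity_correlation(precipitation, wind_speed))
```

A value near +1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and columns with no missing values, such as State in this dataset, have no defined nullity correlation.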

Sample

First rows

Index | State | Start_Time | Severity | Start_Lng | Start_Lat | Humidity(%) | Pressure(in) | Temperature(F) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition
0 | OH | 2016-02-08 06:49:27 | 2 | -84.032608 | 39.063148 | 100.0 | 29.67 | 36.0 | SW | 3.5 | NaN | Overcast
1 | OH | 2016-02-08 07:23:34 | 3 | -84.205582 | 39.747753 | 96.0 | 29.64 | 35.1 | SW | 4.6 | NaN | Mostly Cloudy
2 | OH | 2016-02-08 07:39:07 | 2 | -84.188354 | 39.627781 | 89.0 | 29.65 | 36.0 | SW | 3.5 | NaN | Mostly Cloudy
3 | OH | 2016-02-08 07:44:26 | 3 | -82.925194 | 40.100590 | 97.0 | 29.63 | 37.9 | SSW | 3.5 | 0.03 | Light Rain
4 | OH | 2016-02-08 07:59:35 | 2 | -84.230507 | 39.758274 | 100.0 | 29.66 | 34.0 | WSW | 3.5 | NaN | Overcast
5 | OH | 2016-02-08 07:59:58 | 3 | -84.194901 | 39.770382 | 100.0 | 29.66 | 34.0 | WSW | 3.5 | NaN | Overcast
6 | OH | 2016-02-08 08:00:40 | 2 | -84.172005 | 39.778061 | 99.0 | 29.67 | 33.3 | SW | 1.2 | NaN | Mostly Cloudy
7 | OH | 2016-02-08 08:10:04 | 3 | -82.925194 | 40.100590 | 100.0 | 29.62 | 37.4 | SSW | 4.6 | 0.02 | Light Rain
8 | OH | 2016-02-08 08:14:42 | 3 | -83.119293 | 39.952812 | 93.0 | 29.64 | 35.6 | WNW | 5.8 | NaN | Rain
9 | OH | 2016-02-08 08:21:27 | 3 | -82.830910 | 39.932709 | 100.0 | 29.62 | 37.4 | SSW | 4.6 | 0.02 | Light Rain

Last rows

Index | State | Start_Time | Severity | Start_Lng | Start_Lat | Humidity(%) | Pressure(in) | Temperature(F) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition
2260497 | CA | 2017-08-30 17:32:09 | 3 | -118.046837 | 33.774685 | 47.0 | 29.68 | 87.6 | S | 5.8 | NaN | Partly Cloudy
2260498 | CA | 2017-08-30 17:31:39 | 2 | -117.906784 | 33.853939 | 34.0 | 29.66 | 93.0 | VAR | 3.5 | NaN | Clear
2260499 | CA | 2017-08-30 17:54:40 | 2 | -118.233269 | 34.073830 | 40.0 | 29.67 | 88.0 | VAR | 4.6 | NaN | Clear
2260500 | CA | 2017-08-30 18:04:19 | 3 | -117.938385 | 34.072350 | 27.0 | 29.69 | 98.6 | SSW | 6.9 | NaN | Partly Cloudy
2260501 | CA | 2017-08-30 18:28:48 | 2 | -118.535988 | 34.173161 | 18.0 | 29.66 | 100.0 | WNW | 4.6 | NaN | Clear
2260502 | CA | 2017-08-30 18:41:30 | 3 | -118.623932 | 34.495808 | 18.0 | 28.85 | 100.0 | WNW | 5.0 | 0.0 | Fair
2260503 | CA | 2017-08-30 18:59:02 | 3 | -118.433723 | 34.031322 | 64.0 | 29.69 | 77.0 | SSW | 5.8 | NaN | Clear
2260504 | CA | 2017-08-30 18:57:52 | 3 | -117.369102 | 34.106785 | 16.0 | 29.73 | 102.2 | SSW | 5.8 | NaN | Haze
2260505 | CA | 2017-08-30 19:49:01 | 3 | -118.103981 | 33.924686 | 39.0 | 29.68 | 88.0 | W | 3.5 | NaN | Clear
2260506 | CA | 2017-08-30 20:17:21 | 2 | -117.397354 | 33.729469 | 40.0 | 29.78 | 89.6 | S | 3.5 | NaN | Clear

Duplicate rows

Most frequent

Index | State | Start_Time | Severity | Start_Lng | Start_Lat | Humidity(%) | Pressure(in) | Temperature(F) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition | count
790 | FL | 2020-11-12 06:28:16 | 3 | -80.187675 | 25.942879 | 100.0 | 29.91 | 78.0 | S | 5.0 | 0.00 | Partly Cloudy | 11
2152 | SC | 2018-09-16 13:24:13 | 3 | -81.195084 | 33.978249 | 100.0 | 29.74 | 75.0 | SE | 13.8 | 0.01 | Rain | 11
890 | GA | 2020-03-12 22:33:35 | 3 | -84.513145 | 33.585026 | 65.0 | 28.87 | 69.0 | SSW | 9.0 | 0.00 | Partly Cloudy | 10
2151 | SC | 2018-09-16 13:24:12 | 3 | -81.195084 | 33.978249 | 100.0 | 29.74 | 75.0 | SE | 13.8 | 0.01 | Rain | 8
2601 | TX | 2019-09-16 15:09:55 | 3 | -96.897278 | 32.907650 | 34.0 | 29.46 | 95.0 | E | 9.0 | 0.00 | Partly Cloudy | 7
3027 | WA | 2020-08-14 07:43:26 | 2 | -117.467613 | 47.673512 | 45.0 | 27.67 | 56.0 | SE | 8.0 | 0.00 | Fair | 7
1425 | MO | 2020-10-27 06:54:07 | 2 | -90.284042 | 38.713409 | 86.0 | 29.65 | 37.0 | NNE | 7.0 | 0.00 | Light Rain | 5
1404 | MO | 2020-04-10 19:53:37 | 3 | -94.529579 | 38.843868 | 37.0 | 28.75 | 52.0 | SSE | 9.0 | 0.00 | Cloudy | 4
2284 | SC | 2020-04-03 16:13:52 | 2 | -79.016426 | 33.974316 | 23.0 | 29.82 | 72.0 | W | 6.0 | 0.00 | Fair | 4
2738 | TX | 2020-06-09 09:38:20 | 3 | -95.265442 | 29.768175 | 69.0 | 29.73 | 87.0 | SSW | 9.0 | 0.00 | Fair | 4